Enable multiple treated units in synthetic control quasi experiments #494
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #494 +/- ##
==========================================
+ Coverage 94.59% 95.13% +0.54%
==========================================
Files 28 28
Lines 2053 2384 +331
==========================================
+ Hits 1942 2268 +326
- Misses 111 116 +5
…to multicell geolift notebook
Note to self
Relatively happy with where this is at now. I should do more manual inspection of the tests (which were vibe coded). There is definitely scope to remove some conditional branching if we set the likelihood of all models to be 1 dimensions. So it would change from
Pull Request Overview
This PR enables handling multiple treated units throughout the synthetic control workflow by updating documentation, extending the PyMC models, and adding comprehensive multi-unit tests.
- Added `sphinx-togglebutton` for interactive docs
- Extended `PyMCModel` and `WeightedSumFitter` to accept and process multiple treated units
- Updated `SyntheticControl` class and added end-to-end tests for multi-unit scenarios
Reviewed Changes
Copilot reviewed 6 out of 10 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
pyproject.toml | Added sphinx-togglebutton to docs dependencies |
docs/source/conf.py | Registered sphinx_togglebutton extension |
causalpy/tests/test_pymc_models.py | Added fixtures and tests for multi-unit WeightedSumFitter |
causalpy/tests/test_integration_pymc_examples.py | Added fixtures and integration tests for multi-unit SyntheticControl |
causalpy/pymc_models.py | Updated _data_setter, predict, score, and coefficient printing to support multi-unit |
causalpy/experiments/synthetic_control.py | Renamed dims (control_units → coeffs), parameterized plots and data getters for treated units |
The likelihood of all models is now 2-dimensional. This means we don't have to do conditional branching for single vs multiple treated units, so we've been able to remove a lot of the code in PyMCModel. This has touched a number of experiment classes which are not related to synthetic control.
That last commit was quite a big one. Model likelihoods are now 2-dimensional. This means we can avoid a lot of conditional branching based on single vs multiple treated unit situations.
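To illustrate why a uniform 2D layout removes branching, here is a minimal sketch. It is not CausalPy code: `causal_impact` is a hypothetical helper, and plain nested lists stand in for arrays with dims `["obs_ind", "treated_units"]`.

```python
def causal_impact(y_obs, y_pred):
    """Observed minus predicted, per observation and treated unit.

    Both inputs are nested lists shaped (obs_ind, treated_units).
    Hypothetical helper for illustration only.
    """
    return [
        [obs - pred for obs, pred in zip(obs_row, pred_row)]
        for obs_row, pred_row in zip(y_obs, y_pred)
    ]


# One treated unit: still 2D, with a treated_units axis of length 1.
single = causal_impact([[10.0], [12.0]], [[9.0], [12.5]])

# Two treated units: same function, no special-casing.
multi = causal_impact([[10.0, 20.0], [12.0, 22.0]], [[9.0, 21.0], [12.5, 20.0]])
```

Because the single-unit case is just a second axis of length one, the same code path serves both situations.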
Closes #456
The changes span quite a few files, so I'll give my PR summary and include a Copilot-generated summary in case it's useful.
Ben's PR summary
The main aim of this PR is to enable analysis of synthetic control experiments with multiple treated units. Previously this quasi-experimental situation was kind of possible (and outlined in the Multi-cell geolift analysis notebook), though that worked by literally iterating over each treated unit and running multiple independent analyses.
This PR enables one model to be used when there are either one or multiple treated units. The construction of the model `WeightedSumFitter` still corresponds to an unpooled model, but I am holding off from exploring partial pooling because we have a PR about to be merged which enables user-provided priors (#488) and I'd rather do it using that approach.
My initial implementation led to complex code because the `WeightedSumFitter` model (used for synthetic control) was the only one which had a 2D likelihood (dims of `["obs_ind", "treated_units"]`). All the other models had a 1D likelihood (dims of `["obs_ind"]`). So there was a lot of branching and dealing with special cases. So another large change introduced by this PR is that all likelihood terms are now 2D. This is why there are code changes beyond the `WeightedSumFitter` and `SyntheticControl` classes.
We've also got additional (vibe-coded, but manually examined) tests to cover both the single and multi treated unit cases for synthetic control.
I've also re-built the UML diagrams and relevant notebooks for the docs. The main changes are in the Multi-cell geolift analysis notebook, which is updated to reflect the new functionality.
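As a rough illustration of what "unpooled" means for the weighted sum described above: each treated unit gets its own independent weight vector over the control units, and the prediction is 2D with shape (obs_ind, treated_units). This is a dependency-free sketch, not the `WeightedSumFitter` implementation. The function name, the unit names, and the fixed weights are all assumptions made for the example; in the PR the weights are inferred by the PyMC model.

```python
def weighted_sum(controls, weights):
    """Synthetic control prediction for every treated unit.

    controls: nested list shaped (obs_ind, coeffs), one column per control unit.
    weights:  dict mapping a treated unit name to its own list of weights
              over the control units (unpooled: nothing is shared between units).
    Returns a nested list shaped (obs_ind, treated_units).
    """
    units = list(weights)
    return [
        [sum(w * x for w, x in zip(weights[u], row)) for u in units]
        for row in controls
    ]


# Two observations of two control units (illustrative numbers).
controls = [[1.0, 3.0], [2.0, 4.0]]
# Hypothetical treated units, each with separate weights.
weights = {"treated_a": [0.5, 0.5], "treated_b": [1.0, 0.0]}

mu = weighted_sum(controls, weights)
```

With a single treated unit, `weights` has one entry and `mu` keeps the same 2D layout with one column.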
Copilot generated PR summary
This pull request introduces changes across multiple causal inference experiment modules to improve compatibility with multi-dimensional data structures and ensure proper handling of single treated units. Key updates include modifying data array dimensions, adding new coordinates, and refining plotting methods to handle single-unit data.
Changes to Data Array Structures:
- `causalpy/experiments/diff_in_diff.py`: Updated `self.y` and `COORDS` to include a new dimension `treated_units` for better handling of multi-dimensional data.
- `causalpy/experiments/interrupted_time_series.py`: Modified `self.pre_y` and `self.post_y` to retain 2D shapes and added `treated_units` as a coordinate. Adjusted `COORDS` for PyMCModel compatibility.
- `causalpy/experiments/prepostnegd.py`: Updated `self.y` and `COORDS` to include `treated_units` for improved data structure handling.
- `causalpy/experiments/regression_discontinuity.py` and `causalpy/experiments/regression_kink.py`: Added `treated_units` dimension and coordinate to `self.y` and updated `COORDS`.
Adjustments to Plotting Methods:
- `causalpy/experiments/diff_in_diff.py`: Refined plotting methods (`_plot_causal_impact_arrow`) to use `.isel(treated_units=0)` for single-unit selection in posterior predictive data.
- `causalpy/experiments/interrupted_time_series.py`: Enhanced `_bayesian_plot` to handle single treated units in posterior predictive and impact data.
- `causalpy/experiments/prepostnegd.py`: Adjusted `_bayesian_plot` to use `.isel(treated_units=0)` for posterior predictive data.
- `causalpy/experiments/regression_discontinuity.py` and `causalpy/experiments/regression_kink.py`: Updated `_bayesian_plot` to reflect single treated unit handling in posterior predictive data.
Compatibility with Single-Unit Data:
- `causalpy/experiments/interrupted_time_series.py`: Added logic to handle single-unit data format for SKL models and adjusted impact calculations accordingly.
- `causalpy/experiments/synthetic_control.py`: Changed dimensions and coordinates in `self.datapre_control` for consistency with other modules.
📚 Documentation preview 📚: https://causalpy--494.org.readthedocs.build/en/494/
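The data-structure and plotting changes listed in the summary follow one pattern: lift a 1D outcome into (obs_ind, treated_units) form, then pick the first treated unit when a plot needs a 1D series. The sketch below mimics that pattern with plain lists so it runs without dependencies. The helper names and the `"unit_0"` label are assumptions for illustration; the PR itself works on labelled arrays and selects with `.isel(treated_units=0)`.

```python
def to_2d(y, unit_name="unit_0"):
    """Lift a 1D outcome list into (obs_ind, treated_units) form.

    Returns the 2D data and a coords dict. `unit_name` is a hypothetical label.
    """
    coords = {"obs_ind": list(range(len(y))), "treated_units": [unit_name]}
    return [[value] for value in y], coords


def select_unit(y_2d, index=0):
    """List-based analogue of selecting one treated unit by position."""
    return [row[index] for row in y_2d]


y_2d, coords = to_2d([1.5, 2.5, 3.5])
first_unit = select_unit(y_2d)  # back to a 1D series for plotting
```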